Robust Control of the Multi-armed Bandit Problem

Authors

  • Felipe Caro
  • Aparupa Das Gupta
Abstract

We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that, for each arm, there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy, but we show that the arms become dependent, so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near optimal, but its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and at the same time provides a near-optimal solution to the robust project selection problem.
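For orientation, one plausible way to write the robust counterpart of the Gittins index described above is as a max-min optimal stopping problem. The sketch below uses our own notation (state x_t, per-period reward r(x_t), discount factor β, ambiguity set 𝒫 of transition laws) and is an illustration, not necessarily the paper's exact formulation:

\[
\nu^{R}(x) \;=\; \sup_{\tau \ge 1}\; \inf_{P \in \mathcal{P}}\;
\frac{\mathbb{E}^{P}\!\left[\, \sum_{t=0}^{\tau-1} \beta^{t}\, r(x_t) \;\middle|\; x_0 = x \right]}
     {\mathbb{E}^{P}\!\left[\, \sum_{t=0}^{\tau-1} \beta^{t} \;\middle|\; x_0 = x \right]}.
\]

When 𝒫 is a singleton, the inner infimum disappears and this reduces to the classical Gittins index.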

Related articles

On Robust Arm-Acquiring Bandit Problems

In the classical multi-armed bandit problem, at each stage the player has to choose one of N given projects (arms) to generate a reward that depends on the arm played and its current state. The state process of each arm is modeled by a Markov chain whose transition probabilities are known a priori. The goal of the player is to maximize the expected total reward. One variant of the problem, the so...

Contributions to the Asymptotic Minimax Theorem for the Two-Armed Bandit Problem

The asymptotic minimax theorem for the Bernoulli two-armed bandit problem states that the minimax risk has the order √N as N → ∞, where N is the control horizon, and provides lower and upper estimates. It can be easily extended to the normal two-armed bandit. For the normal two-armed bandit, we generalize the asymptotic minimax theorem as follows: the minimax risk is approximately equal to 0.637√N as N → ∞. Ke...

Deviations of Stochastic Bandit Regret

This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known beforehand by the agent, Audibert et al. (2009) exhibit a policy such that, with probability at least 1 − 1/n, the regret of the policy is of order log n. They have also shown that such a property is not shared by the popular UCB1 policy of Auer et al. (2002). This ...
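As background for the UCB1 policy mentioned above: at each round it plays the arm maximizing the empirical mean reward plus a confidence bonus of sqrt(2 ln t / n_i). A minimal Python sketch (function and variable names are ours, not taken from the cited papers):

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Run the UCB1 index policy: play each arm once, then always pull
    the arm maximizing empirical mean + sqrt(2 ln t / pulls)."""
    counts = [0] * n_arms   # times each arm has been played
    means = [0.0] * n_arms  # empirical mean reward of each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running average
        total += reward
    return total

# Example: two Bernoulli arms with success probabilities 0.4 and 0.6.
probs = [0.4, 0.6]
print(ucb1(lambda i: 1.0 if random.random() < probs[i] else 0.0, 2, 10000))
```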

Two-Armed Bandit Problem, Data Processing, and Parallel Version of the Mirror Descent Algorithm

We consider the minimax setup for the two-armed bandit problem as applied to data processing when there are two alternative processing methods available with different, a priori unknown, efficiencies. One should determine the most effective method and provide for its predominant application. To this end we use the mirror descent algorithm (MDA). It is well known that the corresponding minimax risk has the ...
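The parallel MDA of the cited paper is not reproduced here. As a generic illustration only, the sketch below runs entropic mirror descent (multiplicative weights) on the two-point simplex with importance-weighted reward estimates, essentially the EXP3 family of updates; the uniform exploration mixing is our addition to keep the estimates bounded:

```python
import math
import random

def entropic_mirror_descent(pull, horizon, step=0.05, explore=0.05):
    """Entropic mirror descent (exponentiated gradient) on the 2-simplex,
    with uniform exploration mixed in so importance weights stay bounded."""
    weights = [0.5, 0.5]
    for _ in range(horizon):
        # Sampling distribution: current weights mixed with uniform exploration.
        dist = [(1 - explore) * w + explore / 2 for w in weights]
        arm = 0 if random.random() < dist[0] else 1
        reward = pull(arm)
        grad = [0.0, 0.0]
        grad[arm] = reward / dist[arm]  # importance-weighted reward estimate
        # Mirror step with the entropic regularizer = multiplicative update.
        weights = [w * math.exp(step * g) for w, g in zip(weights, grad)]
        z = sum(weights)
        weights = [w / z for w in weights]
    return weights

# Example: two Bernoulli "processing methods" with success rates 0.3 and 0.7.
probs = [0.3, 0.7]
print(entropic_mirror_descent(lambda i: 1.0 if random.random() < probs[i] else 0.0, 5000))
```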

Multi-Armed Bandit Policies for Reputation Systems

The robustness of reputation systems against manipulation has been widely studied. However, studies of how to use the reputation values computed by those systems are rare. In this paper, we draw an analogy between reputation systems and multi-armed bandit problems. We investigate how to use multi-armed bandit selection policies in order to increase the robustness of reputation systems ...


Publication date: 2014